
Probabilistic Analysis of Linear Programming Decoding

Constantinos Daskalakis, Alexandros G. Dimakis, Richard M. Karp, and Martin J. Wainwright






Abstract — We initiate the probabilistic analysis of linear pro- 
gramming (LP) decoding of low-density parity-check (LDPC) 
codes. Specifically, we show that for a random LDPC code 
ensemble, the linear programming decoder of Feldman et al. 
succeeds in correcting a constant fraction of errors with high 
probability. The fraction of correctable errors guaranteed by 
our analysis surpasses previous non-asymptotic results for LDPC 
codes, and in particular exceeds the best previous finite-length 
result on LP decoding by a factor greater than ten. This 
improvement stems in part from our analysis of probabilistic 
bit-flipping channels, as opposed to adversarial channels. At the 
core of our analysis is a novel combinatorial characterization 
of LP decoding success, based on the notion of a generalized 
matching. An interesting by-product of our analysis is to establish 
the existence of "probabilistic expansion" in random bipartite 
graphs, in which one requires only that almost every (as opposed 
to every) set of a certain size expands, for sets much larger than 
in the classical worst-case setting. 

Keywords: Error-control coding; channel coding; binary sym- 
metric channel; factor graphs; sum-product algorithm; linear 
programming decoding; low-density parity check codes; ran- 
domized algorithms; expanders. 



I. Introduction 

Low-density parity-check (LDPC) codes are a class 
of sparse binary linear codes, first introduced by Gal- 
lager [15], and subsequently studied extensively by various 
researchers [22], [23], [21]. See the book by Richardson and 
Urbanke [24] for a comprehensive treatment of the subject. 
When decoded with efficient iterative algorithms (e.g., the 
sum-product algorithm [20]), suitably designed classes of 
LDPC codes yield error-correcting performance extremely 
close to the Shannon capacity of noisy channels for very large 
codes [6]. Most extant methods for analyzing the performance 
of iterative decoding algorithms for LDPC codes — notably 
the method of density evolution [21], [23] — are asymptotic 
in nature, based on exploiting the high girth of very large 
random graphs. Therefore, the thresholds computed using 
density evolution are only estimates of the true algorithm 
behavior, since they assume a cycle-free message history. In 
fact, the predictions of such methods are well-known to be 
inaccurate for specific codes of intermediate block length (e.g., 
codes with a few hundreds or thousands of bits). For this
reason, our current understanding of practical decoders for 
smaller codes, which are required for applications with delay 
constraints, is relatively limited. 

The focus of this paper is the probabilistic analysis of linear 
programming (LP) decoding, a technique first introduced 
by Feldman et al. [10], [14] as an alternative to iterative 
algorithms for decoding LDPC codes. The underlying idea 
is a standard one in combinatorial optimization — namely, to 
solve a particular linear programming (LP) relaxation of 
the integer program corresponding to maximum likelihood 
(optimal) decoding. Although the practical performance of 
LP decoding is comparable to message-passing decoding, 
a significant advantage is its relative amenability to non- 
asymptotic analysis. Moreover, there turn out to be a number 
of important theoretical connections between the LP decoding 
and standard forms of iterative decoding [19], [31]. These 
connections allow theoretical insight from the LP decoding 
perspective to be transferred to iterative decoding algorithms. 



A. Previous work 

The technique of LP decoding was introduced for turbo-like
codes [10], extended to LDPC codes [11], [14], and
further studied by various researchers (e.g., [28], [5], [12],
[8], [13], [16]). Significant recent interest has focused on
post-processing algorithms that use the ML-certificate property of
LP decoding to achieve near-ML performance (see [8], [4]
and also [9], [26]).

For concatenated expander codes, Feldman and Stein [13] 
showed that LP decoding can achieve capacity; see also [1], 
[18]. For the standard LDPC codes used in practice, the best 
positive result from previous work [12] is the existence of a 
constant $\beta > 0$, depending on the rate of the code, such that
LP decoding can correct any bit-flipping pattern consisting
of at most $\beta n$ bit flips. (In short, we say that LP decoding
can correct a $\beta$-fraction of errors.) As a concrete example, for
suitable classes of rate 1/2 LDPC codes, Feldman et al. [12]
established that $\beta \approx 0.000177$ is achievable. However, this
analysis [12] was worst-case in nature, essentially assuming 
an adversarial channel model. Such analysis yields overly 
conservative predictions for the probabilistic channel models 
that are of more practical interest. Consequently, an important 
direction — and the goal of this paper — is to develop methods 
for finite-length and average-case analysis of the LP decoding 
method. 
B. Our contributions

This paper initiates the average-case analysis of LP decoding
for LDPC codes. In particular, we analyze the following
question: what is the probability, given that a random subset
of $\alpha n$ bits is flipped by the channel, that LP decoding
succeeds in correctly recovering the transmitted codeword?
As a concrete example, we prove that for bit-degree-regular
LDPC codes of rate 1/2 and a random error pattern with
$\alpha n$ bit flips, LP decoding will recover the correct codeword
with probability converging to one^1 for all $\alpha$ up to at least
0.002. This guarantee is roughly ten times higher than the best
guarantee from prior work [12], derived in the setting of an 
adversarial channel. Our proof is based on analyzing the dual 
of the decoding linear program and obtaining a simple graph- 
theoretic condition for certifying a zero-valued solution of the 
dual LP, which (by strong duality) ensures that the LP decoder 
recovers the transmitted codeword. We show that this dual 
witness has an intuitive interpretation in terms of the existence 
of hyperflow from the flipped to the unflipped bits. Although 
this paper focuses exclusively on the binary symmetric channel 
(BSC), the poison hyperflow is an exact characterization of LP 
decoding for any memoryless binary input symmetric output 
(MBIOS) channel. We then show that such a hyperflow witness 
exists with high probability under random errors in the bit- 
degree-regular LDPC ensemble. The argument itself entails 
a fairly delicate sequence of union bounds and concentration 
inequalities, exploiting expansion and matchings on random 
bipartite graphs. 

C. Probabilistic Expanders 

An interesting by-product of our analysis is the proof of the 
existence of probabilistic expanders — that is, bipartite graphs 
in which almost all sets of vertices of size up to $\alpha n$ and their
subsets have large expansion. One key point is that it is not 
sufficient to require a random subset of vertices to expand 
w.h.p., because we use the expansion combined with Hall's 
theorem to guarantee large matchings. What we need instead 
is that a random subset of vertices and all its subsets will 
expand w.h.p. which by Hall's theorem will guarantee that a 
random subset will have a matching. In effect, by relaxing 
the expansion requirement from every set to almost all sets 
of a given size, we show that one can obtain much larger 
expansion factors, and corresponding stronger guarantees on 
error correction. Our analysis relies on the fact that a random 
bipartite graph, conditioned on all the small sets having some 
expansion, will also have this probabilistic expansion for much 
larger constants $\alpha$. This innovation allows us to go beyond
the worst-case fraction of errors guaranteed by traditional 
expansion arguments [12], [25]. 

The remainder of the paper is organized as follows. We
begin in Section II with background on error-control coding
and low-density parity-check codes, as well as the method of
linear programming (LP) decoding. Section III describes our
main result and Section IV provides the proof in a series of
lemmas, with more technical details deferred to the appendix.
We conclude in Section V with a discussion.

^1 Note that our analysis yields a bound on the probability of failure for
every finite block length n.

II. Background and Problem Formulation 

We begin with some background on low-density parity- 
check codes. We then describe the LP decoding method, and 
formulate the problem to be studied in this paper.

A. Low-density parity-check codes 

The purpose of an error-correcting code is to introduce 
redundancy into a data sequence so as to achieve error-free 
communication over a noisy channel. Given a binary vector 
of length k (representing information to be conveyed), the 
encoder maps it onto a codeword, corresponding to a binary 
vector of length n > k. The code rate is given by R = k/n, 
corresponding to the ratio of information bits to transmitted 
bits. In a binary linear code, the set of all possible codewords
corresponds to a subspace of $\{0,1\}^n$, with a total of $2^k$
elements (one for each possible information sequence). The
codeword is then transmitted over a noisy channel. In this
paper, we focus on the binary symmetric channel (BSC), in
which each bit is flipped independently with probability $\alpha$.
Given the received sequence from the channel, the goal of the 
decoder is to correctly reconstruct the transmitted codeword 
(and hence the underlying information sequence). 

Any binary linear code can be described as the null space of
a parity check matrix $H \in \{0,1\}^{(n-k) \times n}$; more concretely, the
code C is given by the set of all binary strings $x \in \{0,1\}^n$ such
that $Hx = 0$ in modulo two arithmetic. A convenient graphical
representation of such a binary linear code is in terms of its
factor graph [20]. The factor graph associated with a code C
is a bipartite graph $G = (V, C)$, with $n = |V|$ variable nodes
corresponding to the codeword bits (columns of the matrix H),
and $m = n - k = |C|$ check nodes corresponding to the parity checks
(rows of the matrix H). Edges in the factor graph connect each
variable node to the parity checks that constrain it, so that the
parity check matrix H specifies the adjacency matrix of the
graph. A low-density parity-check code is a binary linear code
that can be expressed with a sparse factor graph (i.e., one with
$O(1)$ edges per row).
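As a small illustration of these definitions, the following Python sketch checks the parity condition $Hx = 0$ (mod 2) and reads the factor-graph neighborhoods off the rows and columns of H. The matrix is the parity check matrix of the (7,4) Hamming code, used here purely as an assumed toy example (it is not one of the codes studied in this paper), and the function names are hypothetical.

```python
# Toy parity check matrix (assumption: the (7,4) Hamming code, for illustration only).
# Rows are parity checks, columns are codeword bits.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def is_codeword(H, x):
    """Return True if Hx = 0 in modulo-two arithmetic."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)


def factor_graph(H):
    """Return (check_nbrs, var_nbrs): N(a) for each check a, N(i) for each bit i."""
    m, n = len(H), len(H[0])
    check_nbrs = [[i for i in range(n) if H[a][i]] for a in range(m)]
    var_nbrs = [[a for a in range(m) if H[a][i]] for i in range(n)]
    return check_nbrs, var_nbrs


check_nbrs, var_nbrs = factor_graph(H)
```

Each row of H yields the neighborhood N(a) of a check, and each column yields the set of checks constraining a bit, which is exactly the adjacency structure of the factor graph.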

Given a received sequence $\hat{y} \in \{0,1\}^n$ from the BSC,
the maximum likelihood (ML) decoding problem is to deter- 
mine the closest codeword (in Hamming distance). It is well 
known that ML decoding for general binary linear codes is 
NP-hard [2], which motivates the study of sub-optimal but 
practical algorithms for decoding. 
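To make the ML decoding problem concrete, the sketch below performs nearest-codeword decoding by exhaustive search, which is feasible only for very small codes since the number of codewords grows exponentially. It assumes a BSC with flip probability below 1/2 (so that ML decoding coincides with minimum Hamming distance decoding); the (7,4) Hamming matrix and the function names are illustrative assumptions, not objects from this paper.

```python
from itertools import product

# Toy parity check matrix (assumption: the (7,4) Hamming code, for illustration only).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def codewords(H):
    """Enumerate all binary strings x with Hx = 0 (mod 2)."""
    n = len(H[0])
    return [
        x for x in product((0, 1), repeat=n)
        if all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)
    ]


def ml_decode_bsc(H, received):
    """Return a codeword closest to `received` in Hamming distance."""
    def distance(c):
        return sum(ci != ri for ci, ri in zip(c, received))
    return min(codewords(H), key=distance)


# One bit (the last) of the codeword (1,1,1,0,0,0,0) has been flipped.
decoded = ml_decode_bsc(H, (1, 1, 1, 0, 0, 0, 1))
```

The exhaustive enumeration is what makes this approach impractical at realistic block lengths, which is the motivation for the relaxation discussed next.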

B. LP decoding 

We now describe how the problem of optimal decoding 
can be reformulated as a linear program over the codeword 
polytope, i.e. the convex hull of all codewords of the code C. 
For every bit $\hat{y}_i$ of the received sequence $\hat{y}$, define its
log-likelihood as $\gamma_i = \log \frac{\mathbb{P}[\hat{y}_i \mid y_i = 0]}{\mathbb{P}[\hat{y}_i \mid y_i = 1]}$, where $y_i$ represents the
corresponding bit of the transmitted codeword $y$. Using the
memoryless property of the channel, it can be seen that the
maximum likelihood (ML) codeword is

$$\hat{y}_{\mathrm{ML}} = \arg\min_{y \in C} \sum_{i=1}^{n} \gamma_i y_i. \qquad (1)$$

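Without changing the outcome of the minimization, we can
replace the code C by its convex hull conv(C), since a linear
objective over a polytope attains its minimum at a vertex and the
vertices of conv(C) are exactly the codewords. We can thus express
ML decoding as the linear program

$$\hat{y}_{\mathrm{ML}} = \arg\min_{y \in \mathrm{conv}(C)} \sum_{i=1}^{n} \gamma_i y_i. \qquad (2)$$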

Although we have converted the decoding problem from an 
integer program to a linear program, it remains intractable 
because for general factor graphs with cycles, the codeword 
polytope does not have a concise description. 

A natural approach, and one that is standard in operations 
research and polyhedral combinatorics, is to relax the linear 
program by taking only a polynomial set of constraints that 
approximate the codeword polytope conv(C). The first-order 
LP decoding method [14] makes use of a relaxation that results 
from looking at each parity check, or equivalently at each 
row of H, in an independent manner. For each check $a \in C$
in the code, denote by $C_a$ the set of binary sequences that
satisfy it; that is, $C_a$ corresponds to the local parity check
subcode defined by check a and its bit neighbors. Observe
that the full code C is simply the intersection of all the local
codes, and the codeword polytope has the exact representation
$\mathrm{conv}(C) = \mathrm{conv}(\bigcap_{a=1}^{m} C_a)$. The first-order LP decoder simply
ignores interactions between the various local codes, and
performs the optimization over the relaxed polytope given
by $\mathcal{P} := \bigcap_{a=1}^{m} \mathrm{conv}(C_a)$. Note that $\mathcal{P}$ is a convex set that
contains the codeword polytope conv(C), but also includes
additional vertices with fractional coordinates (called pseudocodewords
in the coding literature). It can be shown [31]
that if the LDPC graph had no cycles and hence were tree- 
structured, this relaxation would be exact; consequently, this 
relaxation can be thought of as tree-based. 

In sharp contrast to the codeword polytope for a general
factor graph with cycles, the relaxed polytope $\mathcal{P}$ for LDPC
codes is always defined by a linear number of constraints.
Consequently, LP decoding, which amounts to solving the relaxed
linear program

$$\hat{y}_{\mathrm{LP}} = \arg\min_{y \in \mathcal{P}} \sum_{i=1}^{n} \gamma_i y_i, \qquad (3)$$

can be performed exactly in polynomial time using standard LP
solvers (e.g., interior point or simplex methods), or even faster
with iterative methods tailored to the problem structure [11],
[29], [30], [31].

For completeness, we now provide an explicit inequality
description of the relaxed polytope $\mathcal{P}$. For every check a
connected to neighboring variables in the set N(a) and for
all subsets $S \subseteq N(a)$ with $|S|$ odd, we introduce the following
constraints

$$\sum_{i \in N(a) \setminus S} y_i + \sum_{i \in S} (1 - y_i) \ge 1. \qquad (4)$$

Each such inequality corresponds to constraining the $\ell_1$ distance
of the polytope from the sequences not satisfying check
a (the forbidden sequences) to be at least one. It can be
shown that these forbidding inequalities do not exclude valid
codewords from the relaxed polytope. We also need to add a
set of 2n box inequalities, namely $0 \le y_i \le 1$, in order to
ensure that we remain inside the unit hypercube. The set of
forbidding inequalities along with the [0,1]-box inequalities
define the relaxed polytope.
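The inequality description can be sketched in a few lines of Python: one forbidding inequality is generated for every check a and every odd-sized subset S of N(a), and a point is tested against these inequalities together with the box constraints. The two-check graph, the tolerance parameter, and the function names below are hypothetical choices made only for illustration.

```python
from itertools import combinations


def forbidding_inequalities(check_nbrs):
    """Yield (a, S) for every check a and every odd-sized subset S of N(a)."""
    for a, nbrs in enumerate(check_nbrs):
        for size in range(1, len(nbrs) + 1, 2):
            for S in combinations(nbrs, size):
                yield a, frozenset(S)


def in_relaxed_polytope(check_nbrs, y, tol=1e-9):
    """Test the box constraints and every forbidding inequality at the point y."""
    if any(v < -tol or v > 1 + tol for v in y):
        return False
    for a, S in forbidding_inequalities(check_nbrs):
        # Left-hand side: sum of y_i outside S plus sum of (1 - y_i) inside S.
        lhs = sum((1 - y[i]) if i in S else y[i] for i in check_nbrs[a])
        if lhs < 1 - tol:
            return False
    return True


# Hypothetical toy graph: two checks of degree 3 on four bits.
toy_checks = [[0, 1, 2], [1, 2, 3]]
num_constraints = len(list(forbidding_inequalities(toy_checks)))
```

In this toy graph each degree-3 check contributes 4 inequalities. The codeword (1, 1, 0, 1) satisfies all of them, the non-codeword (1, 0, 0, 0) violates one, and the fractional point (1/2, 1/2, 1/2, 1/2) is feasible, which shows that the relaxed polytope contains non-integral points.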

Note that, given a check a of degree $d_c$, there are $2^{d_c - 1}$ local
forbidden sequences, i.e., sequences of bits in the check neighborhood
N(a) that do not satisfy the check a. Consequently,
for a constant check degree code, the total number of local
forbidden sequences is $2^{d_c - 1} m$, so that the number of forbidding
inequalities scales linearly in the block length n. Fortunately,
in the case of low-density parity-check codes, the degree $d_c$
is usually either a fixed constant (for regular constructions)
or small with high probability (for irregular constructions) so
that the number of local forbidden sequences remains small.
Overall, in the cases of practical interest, the relaxed polytope
can be characterized by a linear number of inequalities in the
way that we have described. (We refer the interested reader
to [14], [32] for alternative descriptions more suitable for the
case of large $d_c$.)

III. Main result and proof outline 

In this section, we state our main result characterizing the 
performance of LP decoding for a random ensemble of LDPC 
codes, and provide an outline of the main steps. Section IV
completes the technical details of the proof.

A. Random code ensemble 

We consider the random ensemble of codes constructed
according to the following procedure. Given a code rate $R \in (0, 1)$,
form a bipartite factor graph $G = (V, C)$ with a set of
$n = |V|$ variable nodes, and $m = |C| = \lfloor (1 - R) n \rfloor$ check
nodes as follows: (i) fix a variable degree $d_v \in \mathbb{N}$; and (ii) for
each variable $j \in V$, choose a random subset N(j) of size
$d_v$ from C, and connect variable j to each check in N(j).
For obvious reasons, we refer to the resulting ensemble as the
bit-degree-regular random ensemble, and use $C(d_v)$ to denote
a randomly-chosen LDPC code from this ensemble.
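The sampling procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the `seed` parameter are hypothetical, each neighborhood N(j) is drawn as a uniformly random subset of size $d_v$ (without replacement), and the rate is converted to an exact fraction so that the floor in the formula for m is not disturbed by floating-point rounding.

```python
import math
import random
from fractions import Fraction


def sample_bit_regular_graph(n, rate, d_v, seed=None):
    """Sample a factor graph in which every variable node has degree d_v.

    Returns (var_nbrs, check_nbrs), the neighborhoods of bits and checks.
    """
    rng = random.Random(seed)
    # m = floor((1 - R) n); passing `rate` as a Fraction keeps this exact.
    m = math.floor((1 - Fraction(rate)) * n)
    # Each bit picks a uniformly random subset of d_v distinct checks.
    var_nbrs = [sorted(rng.sample(range(m), d_v)) for _ in range(n)]
    check_nbrs = [[] for _ in range(m)]
    for j, nbrs in enumerate(var_nbrs):
        for a in nbrs:
            check_nbrs[a].append(j)
    return var_nbrs, check_nbrs


var_nbrs, check_nbrs = sample_bit_regular_graph(20, Fraction(1, 2), 3, seed=0)
```

Only the variable degrees are fixed by this construction; the check degrees are random, which is why the ensemble is regular on the bit side only.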

The analysis of this paper focuses primarily on the binary
symmetric channel (BSC), in which each bit of the transmitted
codeword is flipped independently with some probability $\alpha$.
By concentration of measure for the binomial distribution, it
is equivalent (at least asymptotically) to assume that a constant
fraction of the bits, namely $\alpha n$ of them, is flipped by the channel.
Let $\mathbb{P}$ denote the joint distribution, over both the space of
bit-degree-regular random codes, and the space of $\alpha n$ bit flips.
With the goal of obtaining upper bounds on the LP error probability
$\mathbb{P}[\text{LP fails}]$, our analysis is based on the expansion of the factor graph
of the code. Specifically, the factor graph of a code with
blocklength n is a $(\mu, p)$-expander if all sets S of variable
nodes of size $|S| \le \mu n$ are connected to at least $p|S|$ checks.^2

^2 Throughout this paper, we work with codes with simple parity check
constraints (LDPC codes) which are different from the generalized expander
codes [25], [13] that can have large linear subcodes as constraints.
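For very small graphs, the expansion property can be verified directly by enumerating all sets of variable nodes up to the size threshold. The sketch below does exactly that; it runs in exponential time and is meant only to make the definition concrete. The function name and the toy graph are hypothetical.

```python
import math
from itertools import combinations


def is_expander(var_nbrs, mu, p):
    """Check by brute force that every set S of at most mu*n variable nodes
    is connected to at least p*|S| distinct checks."""
    n = len(var_nbrs)
    max_size = math.floor(mu * n)
    for size in range(1, max_size + 1):
        for S in combinations(range(n), size):
            neighbors = set().union(*(var_nbrs[j] for j in S))
            if len(neighbors) < p * size:
                return False
    return True


# Hypothetical toy graph: 4 bits of degree 3 attached to 6 checks.
toy_graph = [[0, 1, 2], [2, 3, 4], [4, 5, 0], [1, 3, 5]]
```

In this toy graph every pair of bits touches 5 checks, so the graph passes the test with $\mu = 1/2$ and $p = 2$ but fails with $p = 3$.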






B. Statement of main result 

Our main result is a novel bound on the probability of 
error for LP decoding, applicable for finite block length n 
and the bit-degree-regular LDPC ensemble. The main idea is 
to show that, under certain expansion properties of the code, 
LP decoding will succeed in recovering the correct codeword 
with high probability. We note that a random graph will have 
the required expansion properties with high probability. 

In particular, we show that for the joint distribution over
random expander bit-degree-regular codes and $\lceil \alpha_c n \rceil$ (or fewer)
bit flips by the channel, there exists a constant $\alpha_c$, depending
on the expansion properties of the ensemble, such that LP
decoding succeeds with high probability. More formally:

Theorem 1: For every bit-degree-regular LDPC code ensemble
with parameters $R$, $d_v$, $n$, we specify quadruples
$(\alpha_c, c, \mu, p)$ for which the LP decoder succeeds with high
probability over the space of $(\mu, p)$-expander bit-regular random
codes and at most $\lceil \alpha_c n \rceil$ bit flips. The probability of
failure decreases exponentially in c; namely,

$$\mathbb{P}[\,\text{LP success} \mid C(d_v) \text{ is a } (\mu, p) \text{ expander}\,] \ge 1 - e^{-cn}. \qquad (5)$$

We note that any factor graph sampled from the bit-regular
ensemble will be an expander with high probability. In general,
the fraction of correctable errors $\alpha_c$ guaranteed by Theorem 1
is a function of the code ensemble, specified by the code rate
R, the bit degree $d_v$, its expansion parameters $\mu$ and p, and
the error exponent c. For any code rate, the maximum fraction
of correctable errors $\alpha_c$ achieved by our analysis is provably
larger than that of the best previously known result [12] for LP
decoding, which guaranteed correction of a fraction f^rf M of 
errors. As a particular illustration of the stated Theorem 1, we
have the following guarantee for rate $R \approx 1/2$ codes:

Corollary 1: For code rate $R \approx 1/2$, bit degree $d_v = 8$
and error fraction $\alpha \in (0, 0.002)$, the LP decoder succeeds
with probability $1 - o(1)$ over the space of bit-degree-regular
random codes of degree $d_v$ and $\lceil \alpha n \rceil$ bit flips.

More generally, for any code rate R, our analysis in Section IV
(see discussion following Lemma 8) specifies conditions for
the bit flipping probability $\alpha_c$ and the expansion parameters
$\mu$ and p so that the condition (5) is satisfied with a suitable
choice of error exponent c.

C. Outline of main steps 

We now describe the main steps involved in the proof of
Theorem 1.

1) Hyperflow witness: As in previous work [12], we prove 
that the LP decoder succeeds by constructing a dual witness: 
a dual feasible vector with zero dual cost, which guarantees 
that the transmitted codeword is optimal for the primal linear 
program. Using the symmetry of the relaxed polytope, it can be 
shown [14] that the failure or success of LP decoding depends 
only on the subset of bits flipped by the channel and not on the 
transmitted codeword. Consequently, we may assume without 
loss of generality that the all zero codeword was transmitted. 
Moreover, note that, for the binary symmetric channel (BSC)
with flip probability $\epsilon$, the log-likelihood of each received bit
is either $\log \frac{1-\epsilon}{\epsilon}$ or $-\log \frac{1-\epsilon}{\epsilon}$. Since the optimum of the
primal is not affected by rescaling, we may assume without
loss of generality that all $\gamma_i$ are either 1 or $-1$. Then, every
flipped bit i will be assigned $\gamma_i = -1$, whereas every unflipped
bit will be assigned $\gamma_i = 1$. Under these assumptions, Feldman et al. [12]
demonstrated that a dual witness can be graphically interpreted 
as a set of weights on the edges of the factor graph of the code: 

Lemma 1 (Dual witness [12]): Suppose that all bits in the
set F are flipped by the channel, whereas all bits in the
complementary set $F^c := V \setminus F$ are left unchanged. Set
$\gamma_i = -1$ for all $i \in F$, and $\gamma_i = 1$ for all $i \in F^c$. Then linear
programming (LP) decoding succeeds for this error pattern
if there exist weights $\tau_{ia}$ for all checks $a \in C$ and
adjacent bits $i \in N(a)$ such that the following conditions
hold:

$$\tau_{ia} + \tau_{ja} \ge 0 \quad \forall \text{ checks } a \in C \text{ and } \forall \text{ distinct adjacent bits } i, j \in N(a), \qquad (6a)$$

$$\sum_{a \in N(i)} \tau_{ia} < \gamma_i \quad \forall\, i \in V. \qquad (6b)$$

We next introduce a sufficient condition for the success
of LP decoding, one which is equivalent to, but arguably more
intuitive than, the dual witness definition:

Definition 1: A hyperflow for $\gamma$ is a set of edge weights $\tau_{ij}$
that satisfy condition (6b) and, moreover, have the following
additional property: for every check $j \in C$, there exists a
$P_j \ge 0$ such that for exactly one variable $i \in N(j)$, $\tau_{ij} = -P_j$,
and for all the other $i' \in N(j) \setminus \{i\}$, $\tau_{i'j} = P_j$.

The flow interpretation is that each check corresponds to
a hyperedge connecting its adjacent variables; the function
of any check is to replicate the flow incoming from one
variable towards all its other adjacent variables. With this set-up,
condition (6b) corresponds to the requirement that all the
variables i with $\gamma_i < 0$ need to get rid of at least $-\gamma_i$ units of
"poison", whereas each variable i with $\gamma_i > 0$ can absorb at
most $\gamma_i$ units of "poison". Figure 1 illustrates a valid hyperflow
for a simple code.
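The hyperflow conditions are easy to check mechanically on a small example. The sketch below is a hypothetical verifier: edge weights are stored in a dictionary keyed by (bit, check) pairs, and exact rational arithmetic is used so that the equality tests on the weights are not affected by rounding. A check whose edges all carry zero weight is treated as valid with $P_j = 0$, which is an interpretive assumption. The three-bit graph is an illustrative toy example, not the code of Figure 1.

```python
from fractions import Fraction


def is_hyperflow(check_nbrs, gamma, tau):
    """Verify the row-sum condition on every bit and the replication
    structure on every check; tau maps (bit, check) pairs to weights."""
    n = len(gamma)
    # Row-sum condition: total weight on the edges of bit i is below gamma_i.
    total = [0] * n
    for (i, a), w in tau.items():
        total[i] += w
    if any(total[i] >= gamma[i] for i in range(n)):
        return False
    # Replication: one edge carries -P_a, all other edges of check a carry +P_a.
    for a, nbrs in enumerate(check_nbrs):
        weights = [tau.get((i, a), 0) for i in nbrs]
        P = max(abs(w) for w in weights)
        if P == 0:
            continue  # no flow through this check
        sources = sum(1 for w in weights if w == -P)
        sinks = sum(1 for w in weights if w == P)
        if sources != 1 or sinks != len(weights) - 1:
            return False
    return True


# Toy example: bit 0 is flipped and pushes 3/5 units through each of two checks.
toy_checks = [[0, 1], [0, 2]]
toy_gamma = [-1, 1, 1]
x = Fraction(3, 5)
toy_tau = {(0, 0): -x, (1, 0): x, (0, 1): -x, (2, 1): x}
```

Here the flipped bit disposes of 6/5 > 1 units of poison in total, while each unflipped bit absorbs 3/5 < 1 units, so the weights form a valid hyperflow. Pushing only 1/2 unit per check would leave the flipped bit with a total of exactly $-1$, which violates the strict inequality.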

We claim that the existence of a valid hyperflow is equivalent
to the dual witness:

Proposition 1: There exists a weight assignment $\tau_{ij}$ satisfying
the conditions of Lemma 1 if and only if there exists a
hyperflow $\tau'_{ij}$ for $\gamma$.

See Appendix A for a proof of this claim.

2) Hyperflow from (p, q)-matching: Let N(F) denote the
subset of checks that are adjacent to the set F of flipped bits.
One way to construct a hyperflow for $\gamma$ is to match each bit
i in the set F of flipped bits with some number of checks,
say $p < d_v$ checks, to which it has the exclusive privilege to
push flow, say in a uniform fashion. The privilege is exclusive
because in a matching each check is used at most once. Let us refer
to the checks that are actually used in such a matching as
dirty, and to all the checks in N(F) as potentially dirty. The
challenge is that there might be unflipped variables that are
adjacent to a large number of dirty checks, and hence fail to
satisfy condition (6b); i.e., they receive more flow than they
can absorb. Thus, the goal is to construct the matching of
the flipped bits in a careful way so that no unflipped bit has
too many dirty neighbors. The $\delta$-matching witness, used by
Feldman et al. [12], avoids this difficulty by matching all of
the bits adjacent to potentially dirty checks with $\delta = p$ checks
each. Our approach circumvents this difficulty using a more
refined combinatorial object that we call a (p, q)-matching. For
each bit $j \in F^c$, let $Z_j := |N(j) \cap N(F)|$ be the number of
its edges adjacent to checks in N(F).

Fig. 1. Example of a valid hyperflow: bits $x_1$ and $x_4$ have
been flipped in a binary symmetric channel, and hence
each are contaminated with one unit of poison. Each of the
unflipped bits can absorb up to one unit of poison, whereas
the checks act as hyper-edges and replicate any incoming
flow in all directions other than the incoming one. The valid
hyperflow shown in this figure certifies that LP decoding can
correct these two flipped bits.

Definition 2: Given non-negative integers p and q, a
(p, q)-matching is defined by the following conditions:

(a) each bit $i \in F$ must be matched with p distinct checks,
and

(b) each bit $j \in F^c$ must be matched with

$$X_j := \max\{q - d_v + Z_j,\ 0\} \qquad (7)$$

distinct checks from the set N(F).

In all theoretical analysis in this paper, it is technically
convenient to consider only pairs (p, q) such that

$$p > q, \quad 2p + q > 2d_v, \quad \text{and} \quad d_v \ge p + 2.$$

(The lone exception is Figure 2, which is shown only for
illustrative purposes.)
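The request numbers of Definition 2 can be computed directly from the graph and the flipped set. The sketch below is illustrative only: the function name is hypothetical, the graph is assumed to be bit-degree-regular with distinct neighbors per bit, and the six-bit example graph is made up (it uses the same parameters as Figure 2, namely p = 2, q = 3 and $d_v = 4$, but is not the graph drawn there).

```python
def request_numbers(var_nbrs, flipped, p, q):
    """Return {bit: number of checks it must be matched with}."""
    F = set(flipped)
    d_v = len(var_nbrs[0])  # assumes every bit has the same degree
    # N(F): all checks adjacent to at least one flipped bit.
    N_F = set().union(*(var_nbrs[i] for i in F))
    requests = {}
    for j, nbrs in enumerate(var_nbrs):
        if j in F:
            requests[j] = p
        else:
            Z_j = len(N_F.intersection(nbrs))  # edges of j landing in N(F)
            requests[j] = max(q - d_v + Z_j, 0)
    return requests


# Hypothetical graph with d_v = 4: bits 0, 1, 2 are flipped, so N(F) = {0, ..., 7}.
toy_graph = [
    [0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7],    # flipped bits
    [0, 4, 7, 8], [1, 5, 8, 9], [8, 9, 10, 11],  # unflipped bits
]
toy_requests = request_numbers(toy_graph, flipped=[0, 1, 2], p=2, q=3)
```

An unflipped bit with three edges into N(F) requests 3 − 4 + 3 = 2 checks, one with two such edges requests 1 check, and a bit with no edge into N(F) requests none.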

Fig. 2. Illustration of a (p, q)-generalized matching with
p = 2, q = 3, and $d_v = 4$. The first three bits are flipped, and
form the poisoned set F; each flipped bit must be matched
with p = 2 checks from its neighborhood (edges drawn with
arrows). The bit node labeled 2 lies in $F^c$: it connects to
$Z_2 = |N(2) \cap N(F)| = 3$ checks within the set N(F),
and so must be matched with $X_2 = q - (d_v - 3) = 2$
checks from N(F) (two incoming arrows). By construction,
bit 2 then has 2 + 1 = q checks that are not contaminated.
Similarly, bit 1 connects to $Z_1 = 2$ checks from N(F), and so
must be matched with $X_1 = q - (d_v - 2) = 1$ check from
N(F) (incoming arrow). It has a total of 1 + 2 = q checks
that are not contaminated.

We refer to the number of checks with which each variable
node needs to be matched as its request number. In this
language, all flipped bits have p requests while each unflipped
bit j has a variable number of requests $X_j$, which depends on
how many of its edges land on checks which have flipped
neighbors. The following lemma justifies why requests are
selected in this way and illustrates the key property of the
(p, q)-matching:

Lemma 2: A (p, q)-matching guarantees that all the flipped
bits are matched with p checks, and all the non-flipped bits
have q or more non-dirty check neighbors.

This fact follows by observing that any unflipped bit j with
$Z_j$ edges in N(F) has $d_v - Z_j$ clean neighboring checks, and
requests $q - (d_v - Z_j)$ extra checks from the potentially dirty
ones; the checks matched to j cannot be used by any flipped bit,
and hence are not dirty.

Figure 2 illustrates a generalized matching for a degree $d_v = 4$
factor graph, and (p, q) = (2, 3). Note that the bit node
labeled 2 has $Z_2 = 3$ neighbors in the potentially dirty set
N(F), and so it makes $X_2 = 3 - (4 - 3) = 2$ requests for
matching. This ensures that it is connected to 2 + 1 = q checks
that are not dirty. A similar argument applies to bit 1, with
$Z_1 = 2$ and $X_1 = 1$.

We next claim that a (p, q)-matching is a certificate of LP
decoding success:

Lemma 3: For any integers p and q such that $2p + q > 2d_v$,
a (p, q)-matching can be used to generate a set of weights $\tau_{ia}$
which constitute a hyperflow for $\gamma$ and, hence, satisfy the dual
witness conditions (6).

Proof: Each flipped bit is matched with p checks: suppose
it sends x units of poison to each of these checks. In the worst
case, the remaining $d_v - p$ edges are connected to checks
to which other flipped bits are sending poison. Therefore,
each flipped bit (in the worst case) can purge itself of
$px - (d_v - p)x$ units of its own poison, so that we require that
$px - (d_v - p)x > 1$.

By Lemma 2, each unflipped bit has at least q checks that
do not send any poison. In the worst case, then, an unflipped
bit can receive $(d_v - q)x$ units of poison, which we require to
be less than 1. Overall, a valid routing parameter x will exist
if $\frac{1}{2p - d_v} < \frac{1}{d_v - q}$, or equivalently, if $2p + q > 2d_v$, as claimed.
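The two requirements in the proof pin the routing parameter x to an open interval, which the following sketch computes with exact rational arithmetic. The function name is hypothetical, and the sketch assumes $0 \le q < d_v$ so that the upper endpoint is well defined.

```python
from fractions import Fraction


def routing_interval(p, q, d_v):
    """Return the open interval (lo, hi) of values x that satisfy both
    (2p - d_v) x > 1 and (d_v - q) x < 1, or None if no such x exists.
    Assumes 0 <= q < d_v."""
    if 2 * p <= d_v:
        return None  # a flipped bit cannot purge more than it receives
    lo = Fraction(1, 2 * p - d_v)  # flipped bits need x above this value
    hi = Fraction(1, d_v - q)      # unflipped bits need x below this value
    return (lo, hi) if lo < hi else None
```

For example, `routing_interval(6, 5, 8)` returns the interval (1/4, 1/3), so any x strictly between these values works when $d_v = 8$, p = 6 and q = 5. For the illustrative parameters (p, q, $d_v$) = (2, 3, 4) the function returns `None`, in agreement with the condition $2p + q > 2d_v$ failing for those values.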



In fact, it can be verified that our combinatorial witness for LP
decoding success is easier to satisfy than the condition used
by Feldman et al. [12]. Our use of this improved witness,
along with our focus on the probabilistic setting, are the two
ingredients that allow us to establish a much larger fraction
$\alpha_c$ of correctable errors.

3) From expansion to matching via Hall's theorem: The
remainder (and bulk) of the analysis involves establishing that,
with high probability over the selection of random expander
bit-degree-regular codes and random subsets of $\lceil \alpha n \rceil$ flipped
bits, a (p, q)-matching exists, for suitable values q < p to
be specified later. It is well known [3], [25] that random
regular bipartite graphs will have good expansion, with high
probability:

Lemma 4 (Good expansion): For any fixed code rate $R \in (0, 1)$,
degree $d_v$ and $p \le d_v - 2$, there exist constants $\mu, c > 0$
so that a code $C(d_v)$ from the bit-degree-regular ensemble
of degree $d_v$ is a $(\mu n, p)$ expander with probability at least
$1 - O(1/n)$.

Therefore, conditioned on the event that the random graph
is an expander, the next step is to analyze the existence of
a (p, q)-matching. We use Hall's theorem [27], which in our
context states that a matching exists if and only if every subset
of the variable nodes has (jointly) enough neighbors in N(F)
to cover the sum of its requests.

Given our random graph and channel models, an equivalent
description of the neighborhood choices for each variable
$j \in F^c$ is as follows. Each node $j \in F^c$ picks a random
number $Z_j \in \{0, 1, \ldots, d_v\}$ according to the binomial distribution
$\mathrm{Bin}(d_v, |N(F)|/m)$ and picks a subset of N(F) of size
$Z_j$. This subset corresponds to the intersection of its check
neighborhood N(j) with the check neighborhood N(F) of
the flipped bits. The remaining $d_v - Z_j$ edges from bit j
connect to checks outside N(F). With this set-up, we now
define a "bad event" $\mathcal{E}$, defined by the existence of a pair
$(S_1, S_2) \in 2^{F} \times 2^{F^c}$ of sets that contracts, meaning that it has
more requests than neighbors, so that

$$|N(S_1) \cup [N(S_2) \cap N(F)]| < p\,|S_1| + \sum_{j \in S_2} \max\{0,\ q - (d_v - Z_j)\}. \qquad (8)$$

Notice that only the neighbors in N(F) are counted, since a
(p, q)-matching involves only checks in N(F). By Lemma 3 and
Hall's theorem, the event $\mathcal{E}$ must occur whenever LP decoding fails, so that
we have the inequality $\mathbb{P}[\text{LP decoding fails}] \le \mathbb{P}[\mathcal{E}]$. Defining
the event

$$\mathcal{G}(d_v, \mu, p) := \{C(d_v) \text{ is a } (\mu n, p) \text{ expander}\}, \qquad (9)$$

we make use of the following conditional form of this inequality:

$$\mathbb{P}[\text{LP decoding fails} \mid \mathcal{G}] \le \mathbb{P}[\mathcal{E} \mid \mathcal{G}].$$



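It is useful to partition the space $2^{F} \times 2^{F^c}$ into three subsets
controlled by the parameters $\epsilon_2, \mu > 0$. Parameter $\epsilon_2 > 0$ is a
small constant to be specified later in the proof and $\mu$ is the
expansion coefficient. Writing A for the full space $2^{F} \times 2^{F^c}$,
the three subsets of interest are given by

$$A_1 := \{(S_1, S_2) \mid (S_1, S_2) \in A,\ |S_1| + |S_2| \le \mu n\}, \qquad (10a)$$

$$A_2 := \{(S_1, S_2) \mid (S_1, S_2) \in A - A_1,\ |S_1| \ge \epsilon_2 n\}, \qquad (10b)$$

$$A_3 := A \setminus (A_1 \cup A_2). \qquad (10c)$$

Fig. 3: Partitioning the space $2^{F} \times 2^{F^c}$ into the regions $A_1$, $A_2$, $A_3$.
(Horizontal axis: $|S_1|$, with marks at $\epsilon_2 n$, $\mu n$, $s_{\mathrm{crit}} n$ and $\lceil \alpha n \rceil$;
vertical axis: $|S_2|$, up to $n - \lceil \alpha n \rceil$.)

This partition, as illustrated in Figure 3, decomposes $\mathcal{E}$ into
sub-events

$$\mathcal{E}(A_i) := \{\exists\, (S_1, S_2) \in A_i \ :\ \text{equation (8) holds}\}$$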

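for i = 1, 2, 3. Then, via a series of union bounds, we have
the following upper bound on the probability of failure:

$$\mathbb{P}[\text{LP fails} \mid \mathcal{G}] \le \mathbb{P}[\mathcal{E} \mid \mathcal{G}] \le \sum_{i=1}^{3} \mathbb{P}[\mathcal{E}(A_i) \mid \mathcal{G}].$$

However, all subsets of variable nodes of size at most $\mu n$ in
a $(\mu n, p)$ expander have a p-matching and, because q < p, it
follows that

$$\mathbb{P}[\mathcal{E}(A_1) \mid \mathcal{G}] = 0. \qquad (11)$$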



Consequently, we only have to deal with the remaining two
terms of the summation for i = 2 and i = 3. Before
proceeding, an important side-remark here is that equation (11)
by itself implies that the LP decoder can correct a constant
fraction of errors; indeed, it is precisely this observation that
was exploited by Feldman et al. [12]. However, our ultimate
goal in this paper is to establish higher fractions of correctable
errors, so we need to continue our analysis further.
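For $i = 2, 3$, we have

$$\mathbb{P}[\mathcal{E}(A_i) \mid \mathcal{G}] = \frac{\mathbb{P}[\mathcal{E}(A_i) \wedge \mathcal{G}]}{\mathbb{P}[\mathcal{G}]} \le 2\,\mathbb{P}[\mathcal{E}(A_i)], \tag{12}$$

where the last inequality follows from Lemma 4. Overall, putting everything together, we conclude that

$$\mathbb{P}[\text{LP fails} \mid \mathcal{G}] \le 2 \sum_{i=2}^{3} \mathbb{P}[\mathcal{E}(A_i)]. \tag{13}$$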



The remainder of the proof consists of careful analysis of these two error terms. It turns out to be convenient to use an alternative probabilistic model in the analysis. In particular, observe that there is an inconvenient asymmetry in the definition of our generalized matching: the bits of set $F^c$ need to be matched with checks from the neighborhood of the flipped bits $F$, and not from the whole set of checks from which they select their neighbors. This correlation between $N(F)$ and the number of requests from set $F^c$ creates severe complications in the analysis. Indeed, any attempt to use Hall's condition through union bounds seems to require independence among different edges; moreover, crude upper bounds on the number of requests from set $F^c$ seem inadequate to decorrelate the requests of $F^c$ from the size of $N(F)$. For this reason, we use an alternative probabilistic model, as described in Section IV-A.

IV. Proof of Theorem 1

We now turn to the remaining (somewhat more technical) steps involved in the proof of Theorem 1.

A. Simplifying the probability model 

In order to decouple the distribution of the requests of $F^c$ from the size of $N(F)$, observe that the number of requests $X_j$ from each bit $j$ in $F^c$ grows linearly with the number of edges that this bit has in $N(F)$. Notice that the checks are selected with replacement and the degree of a variable can be strictly smaller than $d_v$, although this will not be an issue asymptotically. This observation combined with a coupling argument shows that, if $x, x' \in \{0, \ldots, d_v\}^{|F^c|}$ are two vectors of requests from the bits in $F^c$, where $x \le x'$ elementwise, then the probability that a $(p, q)$-matching exists is larger conditioned on $x$ than on $x'$.

This observation suggests the following alternative experiment:

• A node $j \in F^c$ first picks a random number $Z_j \in \{0, 1, \ldots, d_v\}$ according to the modified binomial distribution $\mathrm{Bin}\left(d_v, \frac{\lceil \alpha n \rceil d_v}{m}\right)$.

• Node $j$ then chooses $Z_j$ checks from $N(F)$ with replacement.

This procedure is repeated independently for each $j \in F^c$.
Since $|N(F)| \le d_v \lceil \alpha n \rceil$, the bits of set $F^c$ will tend to have more edges in $N(F)$ and, therefore, more requests in this new experiment than in the original one (as suggested by the natural coupling between the two processes). Moreover, since checks are now chosen with replacement, for each bit $j \in F^c$, the size of the intersection $N(j) \cap N(F)$ is less than or equal to $Z_j$, since the same check might be chosen more than once. Intuitively, the existence of matchings is less likely in the new experiment than in the original one; this claim follows rigorously by combining these observations with the coupling argument used in the previous paragraph. The benefit of switching from the original experiment to this new experiment is in allowing us to decouple the process of deciding the number of requests made by each bit in $F^c$ from the cardinality of the random set $N(F)$.
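The alternative experiment can be made concrete with a small simulation. The following is only an illustrative sketch, not part of the proof: the function name `alternative_experiment`, the way the stand-in for $N(F)$ is generated (each flipped bit draws $d_v$ checks with replacement), and the capping of the binomial success probability at 1 are all assumptions made for the illustration.

```python
import math
import random


def alternative_experiment(n, m, alpha, d_v, q, seed=0):
    """Sample (Z_j, chosen checks, requests) for every unflipped bit j."""
    rng = random.Random(seed)
    n_flipped = math.ceil(alpha * n)
    # Stand-in for N(F): every flipped bit draws d_v checks with replacement.
    flipped_nbhd = sorted({rng.randrange(m) for _ in range(n_flipped * d_v)})
    # Success probability of the binomial draw, capped at 1 (assumption).
    prob = min(1.0, n_flipped * d_v / m)
    samples = []
    for _ in range(n - n_flipped):
        # Z_j ~ Bin(d_v, prob), drawn as a sum of d_v Bernoulli trials.
        z = sum(rng.random() < prob for _ in range(d_v))
        # Z_j checks drawn from N(F) with replacement; duplicates collapse.
        checks = {rng.choice(flipped_nbhd) for _ in range(z)}
        # Number of matching requests: max{0, q - (d_v - Z_j)}.
        requests = max(0, q - (d_v - z))
        samples.append((z, checks, requests))
    return flipped_nbhd, samples
```

Because duplicates collapse in the set of chosen checks, the sketch exhibits the property used in the text: the number of distinct checks hit inside $N(F)$ never exceeds $Z_j$.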

Let us use $\mathbb{Q}$ to denote the probability distribution over random graphs in this new model. Setting $F^c(q) = \{j \in F^c \mid q > d_v - Z_j\}$, we can define the alternative "bad event" $\mathcal{B}$, meaning that there exist $S_1 \subseteq F$ and $S_2 \subseteq F^c(q)$ such that

$$|N(S_1) \cup [N(S_2) \cap N(F)]| < p|S_1| + \sum_{j \in S_2} [q - (d_v - Z_j)]. \tag{14}$$

In addition, we define the corresponding sub-events $\mathcal{B}(A_i)$ for $i = 1, 2, 3$. As argued above, it must hold that

$$\mathbb{P}[\mathcal{E}(A_i)] \le \mathbb{Q}[\mathcal{B}(A_i)], \quad \text{for all } i = 1, 2, 3,$$

and, therefore, as inequality (13) suggests, in order to upper bound the probability of LP decoding failure, it suffices to obtain upper bounds on the probabilities $\mathbb{Q}[\mathcal{B}(A_i)]$ for $i = 2, 3$.

For future use, we define for fixed subsets $S_1 \subseteq F$ and $S_2 \subseteq F^c(q)$, the event $\mathcal{B}(S_1, S_2)$ that equation (14) holds for $S_1$ and $S_2$. We now proceed, in a series of steps, to obtain suitable upper bounds on the probabilities $\mathbb{Q}[\mathcal{B}(A_i)]$ and, hence, on the probability of LP decoding failure.

B. Conditioning on requests from $F^c$

For each $i \in \{1, \ldots, q\}$, we define the random variable

$$Y_i := |\{j \in F^c \mid Z_j = d_v - (q - i)\}| \tag{15}$$

corresponding to the number of bits in $F^c$ with $d_v - (q - i)$ edges that lie inside the "contaminated" neighborhood $N(F)$ and, hence, with $i$ requests each. Note that each $Y_i$ is binomial with parameters $\lfloor (1 - \alpha) n \rfloor$ and

$$b_i := \binom{d_v}{d_v - q + i} \left(\frac{\lceil \alpha n \rceil d_v}{m}\right)^{d_v - q + i} \left(1 - \frac{\lceil \alpha n \rceil d_v}{m}\right)^{q - i}. \tag{16}$$

Since $\mathbb{E}[Y_i] = b_i \lfloor (1 - \alpha) n \rfloor$, applying Hoeffding's inequality [17] yields the sharp concentration

$$\mathbb{Q}\big[\,|Y_i - b_i \lfloor (1 - \alpha) n \rfloor| \ge \epsilon_1 n\,\big] \le 2 \exp(-2 \epsilon_1^2 n)$$

for any $\epsilon_1 > 0$. Hence, if we define the event

$$\mathcal{T}(\epsilon_1) := \bigcap_{i=1}^{q} \big\{\,|Y_i - b_i \lfloor (1 - \alpha) n \rfloor| \le \epsilon_1 n\,\big\}, \tag{17}$$

then a simple union bound yields that

$$\mathbb{Q}[\mathcal{T}(\epsilon_1)] \ge 1 - 2q \exp(-2 \epsilon_1^2 n),$$

so that it suffices to bound the conditional probabilities $\mathbb{Q}[\mathcal{B}(A_i) \mid \mathcal{T}(\epsilon_1)]$, $i = 2, 3$. Note that conditioned on the event $\mathcal{T}(\epsilon_1)$, we are guaranteed that
Y, 



(18) 
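To get a feel for the quantities in this subsection, the following sketch evaluates the binomial parameters $b_i$ and the union-bound estimate on the probability of the typical event. It assumes that the success probability $\lceil \alpha n \rceil d_v / m$ is at most 1 and that the Hoeffding exponent is $-2\epsilon_1^2 n$; the function names are hypothetical and chosen only for this illustration.

```python
import math


def request_probabilities(n, m, alpha, d_v, q):
    """Return [b_1, ..., b_q], the probability that a bit makes i requests."""
    rho = math.ceil(alpha * n) * d_v / m  # assumed to lie in [0, 1]
    return [
        math.comb(d_v, d_v - q + i) * rho ** (d_v - q + i) * (1 - rho) ** (q - i)
        for i in range(1, q + 1)
    ]


def typical_event_lower_bound(n, q, eps1):
    """Union-bound estimate 1 - 2 q exp(-2 eps1^2 n); may be vacuous (< 0)."""
    return 1 - 2 * q * math.exp(-2 * eps1 ** 2 * n)
```

For small block lengths the lower bound can be negative, in which case it carries no information; it becomes meaningful once $\epsilon_1^2 n$ is large compared with $\log q$.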



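We now turn to bounding the probability of the bad event $\mathcal{B}$. Since, by symmetry, the probability of the event $\mathcal{B}(S_1, S_2)$ is the same for different sets of the same size, a union bound gives $\mathbb{Q}[\mathcal{B}(A_2) \mid \mathcal{T}(\epsilon_1)] \le \sum_{s_1 = \lceil \epsilon_2 n \rceil}^{\lceil \alpha n \rceil} D(s_1)$, where

$$D(s_1) := \binom{\lceil \alpha n \rceil}{s_1}\, \mathbb{Q}\big[\exists\, S_2 \subseteq F^c(q) \text{ with } (S_1, S_2) \in A_2 \text{ s.t. } \mathcal{B}(S_1, S_2) \mid \mathcal{T}(\epsilon_1)\big],$$

with $S_1$ any fixed set of size $s_1$.

The following lemma allows us to restrict our attention to linearly-sized check neighborhoods $N(S_1)$ in analyzing the individual terms $E(\gamma_1, s_1)$ of the summation; the proof is provided in Appendix D.

Lemma 6 ($\gamma_1$ small): Define the critical point

$$\gamma_{crit}(\bar{s}_1) := \sup\left\{ \bar{\gamma}_1 \in (0, d_v \bar{s}_1] \;\Big|\; 2 + d_v \bar{s}_1 \log_2\left(\frac{\bar{\gamma}_1}{1 - R}\right) < 0 \right\}. \tag{22}$$

Then, for set sizes $s_1 \ge \lceil \epsilon_2 n \rceil$ and neighborhood sizes $\gamma_1 \le \gamma_{crit}(\epsilon_2)\, n$, the quantity $E(\gamma_1, s_1)$ decays exponentially fast in $n$.

Note that the supremum (22) is always finite. This lemma essentially says that if $s_1$ has linear size, then the neighborhood size $\gamma_1$ must also be linear.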
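The critical point of Lemma 6 admits a simple closed form, which the sketch below evaluates. It assumes that the defining condition reads $2 + d_v \bar{s}_1 \log_2(\bar{\gamma}_1 / (1 - R)) < 0$, as suggested by the function studied in Appendix D; under that assumption the supremum equals $\min\{d_v \bar{s}_1, (1 - R)\, 2^{-2/(d_v \bar{s}_1)}\}$. The function names are hypothetical.

```python
import math


def g(gamma_bar, s_bar, d_v, rate):
    """Exponent bound g(gamma) = 2 + d_v * s * log2(gamma / (1 - R))."""
    return 2 + d_v * s_bar * math.log2(gamma_bar / (1 - rate))


def gamma_crit(s_bar, d_v, rate):
    """Supremum of gamma in (0, d_v * s] with g(gamma) < 0 (assumed form)."""
    # g(gamma) < 0  <=>  gamma < (1 - R) * 2 ** (-2 / (d_v * s)).
    return min(d_v * s_bar, (1 - rate) * 2 ** (-2 / (d_v * s_bar)))
```

For very small `s_bar` the critical value is tiny (and may underflow to zero in floating point), which reflects how conservative the crude constant 2 in the exponent is.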

Before bounding the terms $D(s_1)$, we first partition the values of $s_1$ into two sets $\{\lceil \epsilon_2 n \rceil, \ldots, \lceil s_{crit} n \rceil\}$ and $\{\lceil s_{crit} n \rceil + 1, \ldots, \lceil \alpha n \rceil\}$ for some value of $s_{crit}$ to be specified formally in Lemma 5. To give some intuition, in the conditional space $\mathcal{T}(\epsilon_1)$, the total number of matching-requests from the bits of set $F^c$ is at most



V 



i=l 



(19) 



Therefore, if $\mathbb{Q}[\mathcal{B}(A_2) \mid \mathcal{T}(\epsilon_1)]$ is indeed small, we would expect that, if the set $S_1$ is large enough, then with high probability, the size of its image $N(S_1)$ should be large enough not only to cover its own requests but also $V$ additional requests, viz. $|N(S_1)| > p|S_1| + V$. If this condition holds, then there cannot exist any set $S_2$ such that the event $\mathcal{B}(S_1, S_2)$ occurs. We formalize this intuition in the following result, proved in Appendix C.
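Lemma 5 (Upper Regime): Define $v := V/n$ and the function

$$f(s) := \alpha H\left(\frac{s}{\alpha}\right) + (1 - R)\, H\left(\frac{ps + v}{1 - R}\right) + d_v s \log_2\left(\frac{ps + v}{1 - R}\right),$$

where $H(\cdot)$ is the binary entropy function, and set

$$s_{crit} := \min\{\alpha,\; \inf\{s \in [0, \alpha] \mid f(s') < 0,\; \forall s' \in [s, \alpha]\}\}. \tag{20}$$

Then for all $s_1 \in \{\lceil s_{crit} n \rceil + 1, \ldots, \lceil \alpha n \rceil\}$, the quantity $D(s_1)$ decays exponentially fast in $n$.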
It remains to bound $D(s_1)$ for $s_1 \in L_l$, where

$$L_l := \{\lceil \epsilon_2 n \rceil, \ldots, \lceil s_{crit} n \rceil\}.$$

For a randomly chosen set $S_1$, define the event

$$\mathcal{T}(s_1, \gamma_1) := \{|S_1| = s_1,\; |N(S_1)| = \gamma_1\}. \tag{21}$$

By conditioning, we have the decomposition $D(s_1) = \sum_{\gamma_1 = 1}^{d_v s_1} E(\gamma_1, s_1)$, where

$$E(\gamma_1, s_1) := \binom{\lceil \alpha n \rceil}{s_1}\, \mathbb{Q}'[\mathcal{T}(s_1, \gamma_1)] \times \mathbb{Q}'\big[\exists\, S_2 \text{ with } (S_1, S_2) \in A_2 \text{ s.t. } \mathcal{B}(S_1, S_2) \mid \mathcal{T}(s_1, \gamma_1)\big].$$

Here $\mathbb{Q}'$ denotes the probability distribution $\mathbb{Q}$ conditioned on the event $\mathcal{T}(\epsilon_1)$.

To summarize our progress thus far, we first argued that in order to bound the probability $\mathbb{Q}[\mathcal{B}(A_2) \mid \mathcal{T}(\epsilon_1)]$, it suffices to bound the quantities $D(s_1)$, for $s_1 \in \{\lceil \epsilon_2 n \rceil, \ldots, \lceil \alpha n \rceil\}$. Next we partitioned the range of $s_1$ into two sets: the lower set $L_l = \{\lceil \epsilon_2 n \rceil, \ldots, \lceil s_{crit} n \rceil\}$, and the upper set $U_l := \{\lceil s_{crit} n \rceil + 1, \ldots, \lceil \alpha n \rceil\}$. The upper set has the property that for all sets $S_1 \subseteq F$ of size $|S_1| \in U_l$, with high probability the neighborhood $N(S_1)$ is big enough to accommodate not only the matching requests from set $S_1$, but also all possible matching-requests from any set $S_2 \subseteq F^c$. Having established this property of large $S_1$ sets, it remains to focus on small $S_1$. In this regime, the neighborhood $N(S_1)$ on its own is no longer sufficient to cover the joint set of requests from $S_1$ and from any possible set $S_2 \subseteq F^c$. Consequently, one has to consider, for every choice $(S_1, S_2) \in A_2$, whether the joint neighborhood $N(S_1) \cup (N(S_2) \cap N(F))$ is large enough to cover the matching requests from $S_1$ and $S_2$.

At this point, one might imagine that a rough concentration argument applied to the sizes of $N(S_1)$ and $N(S_2) \cap N(F) \setminus N(S_1)$ would suffice to complete the proof. Unfortunately, any concentration result must be sufficiently strong to dominate the factor $\binom{\lceil \alpha n \rceil}{s_1}$ that leads the expression $D(s_1)$. Consequently, we study the exact distribution of the size of $N(S_1)$, and bound the quantities $E(\gamma_1, s_1)$ for $s_1 \in L_l$ and $\gamma_1 \in \{1, \ldots, d_v s_1\}$.

Of course, since $s_1$ is linear in size, the bulk of the probability mass is concentrated on linear values for $\gamma_1$. Therefore, by Lemma 6, we need only bound $E(\gamma_1, s_1)$ for $s_1 \in L_l$ and $\gamma_1 > \gamma_{crit}(\epsilon_2)\, n$. We complete these steps in the following subsection.

C. Completing the bound 

Let us fix sizes $s_1 \in L_l$ and $\gamma_1 > \gamma_{crit}(\epsilon_2)\, n$. For a set $S_1$ of size $s_1$ with neighborhood $N(S_1)$ of size $\gamma_1$, define its residual neighborhood to be the set $N(F) \setminus N(S_1)$ and use $\gamma_2 := |N(F) \setminus N(S_1)|$ to denote its size. Moreover, given a vector of requests¹ $y \in \bigotimes_{i=1}^{q} \{0, \ldots, \lceil \bar{y}_i^{up} n \rceil\}$, let us denote by $\beta(s_1, \gamma_1, y)$ the number of checks missing from the neighborhood of $S_1$ to cover the total number of requests from $S_1$ and a set $S_2$ with configuration of requests $y$. Also, let $\nu(y)$ be the number of edges from $S_2$ to $N(F)$. More precisely, the quantities $\beta(s_1, \gamma_1, y)$ and $\nu(y)$ are given by the formulae

$$\beta(s_1, \gamma_1, y) := p s_1 - \gamma_1 + \sum_{i=1}^{q} i\, y_i, \tag{23a}$$
$$\nu(y) := \sum_{i=1}^{q} (d_v - q + i)\, y_i. \tag{23b}$$

¹Recall that we have conditioned on the event $\mathcal{T}(\epsilon_1)$, so that the number of bits in $F^c$ with $i$ matching requests is concentrated, for every $i \in \{1, \ldots, q\}$.

Note that for $s_1 \in L_l$ and $\gamma_1 > \gamma_{crit}(\epsilon_2)\, n$, the quantity $\beta$ also grows linearly in $n$; as usual, we use $\bar{\beta}$ to denote the rescaled quantity $\beta / n$. Also recall the definition of $\bar{y}_i^{up}$ from equation (18).
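As a quick illustration of the two bookkeeping quantities (23a) and (23b), the sketch below computes them for a request vector stored as a Python list, where entry `y[i - 1]` is taken to be the number of bits of $S_2$ that make $i$ requests. The function names and the list representation are assumptions made for this example only.

```python
def missing_checks(s1, gamma1, y, p):
    """beta(s1, gamma1, y) = p*s1 - gamma1 + sum_i i*y_i, following (23a)."""
    return p * s1 - gamma1 + sum(i * y_i for i, y_i in enumerate(y, start=1))


def edges_into_flipped_neighborhood(y, d_v, q):
    """nu(y) = sum_i (d_v - q + i)*y_i, following (23b)."""
    if len(y) != q:
        raise ValueError("request vector must have exactly q entries")
    # A bit with i requests has d_v - q + i edges inside N(F).
    return sum((d_v - q + i) * y_i for i, y_i in enumerate(y, start=1))
```

For instance, with $p = 6$, $s_1 = 10$, $\gamma_1 = 50$ and `y = [1, 0, 2, 0, 1]`, the first function returns $60 - 50 + 12 = 22$ checks still needed from the residual neighborhood.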

Letting $\bar{y} := (\bar{y}_1, \ldots, \bar{y}_q)$ be a vector of request fractions in $[0, 1]^q$, we define



G(si,7i,72,y) := ^ min{0, Gfe(si, 71, 72, y)} 
where 

/ yt 



G2 := 
G3 := ((l-i?)-7i)i^ 



72 



+(i„(a - si) lo, 



((l-i?)-7i) 
71 + 72 



G4 := 72-ff 



min{72,/3(si,7i,y)} 



72 



-v{y) log2 



71 +min{72,/3(si,7i,y)} 
71 + 72 



With these definitions, we have the following result: 

Lemma 7 (Exponential upper bound): Suppose that the following inequalities hold:

Scrit < (25a) 
ad, < ^25b) 
aH (^^) ™ .-„it) log2 (^^) < 0. (25c) 
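Then for some $c > 0$, we have the exponential upper bound

$$\mathbb{Q}[\mathcal{B}(A_2) \mid \mathcal{T}(\epsilon_1)] \le 2^{n F(\alpha)} + \exp(-cn),$$

where the function in the exponent is given by

$$F(\alpha) := \sup_{\bar{s}_1, \bar{\gamma}_1, \bar{\gamma}_2, \{\bar{y}_i\}} G(\bar{s}_1, \bar{\gamma}_1, \bar{\gamma}_2, \bar{y}_1, \ldots, \bar{y}_q), \tag{26}$$

with the maximization over

$\bar{s}_1 \in [0, s_{crit}]$,

$\bar{\gamma}_1 \in [0, d_v \bar{s}_1]$,

$\bar{\gamma}_2 \in [0, d_v (\alpha - \bar{s}_1)]$, and



See Appendix E for a proof of this lemma.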

It remains to upper bound the probability of the bad event $\mathcal{B}(A_3)$, which is equivalent to the existence of a pair of contracting sets $(S_1, S_2)$, where the size of set $S_1 \subseteq F$ is at most $\epsilon_2 n$ and the size of set $S_2 \subseteq F^c$ is at least $(\mu - \epsilon_2) n$. Note that we have not yet specified the constant $\epsilon_2$. The following lemma establishes that there exists a value of $\epsilon_2$ so that $\mathbb{Q}[\mathcal{B}(A_3) \mid \mathcal{T}(\epsilon_1)]$ is bounded by an exponentially decreasing function in $n$, provided that the function $F(\alpha)$ from equation (26) is negative. The proof of this final lemma is provided in Appendix F.

Lemma 8: If $F(\alpha) < 0$, then there exists $\epsilon_2$ so that the probability $\mathbb{Q}[\mathcal{B}(A_3) \mid \mathcal{T}(\epsilon_1)]$ decreases exponentially in $n$.

We may now complete the proof of Theorem 1. For a given rate $R$, fix the bit degree $d_v$ and the matching parameters $(p, q)$ such that

$$2p + q > 2 d_v, \quad p > q, \quad \text{and} \quad d_v - p \ge 2, \tag{27}$$

and recall the definition (20) of $s_{crit}$. Suppose that the three inequalities (25) hold, and that the function $F$ defined in equation (26) satisfies

$$F(\alpha) < 0. \tag{28}$$

Then the probability

$$\mathbb{P}[\text{LP decoding fails} \mid C(d_v) \text{ is a } (\mu n, p) \text{ expander}]$$

decays exponentially in $n$, where $\mathbb{P}$ is the uniform distribution over the set of bit-degree-regular codes of degree $d_v$ and selections of $\lceil \alpha n \rceil$ bit flips.

In particular, these explicit conditions allow us to investigate fractions of correctable errors on specific code ensembles. As a concrete example, for code rate $R = 1/2$, if we choose variable degrees $d_v = 8$ and generalized matching parameters $(p, q) = (6, 5)$, one can numerically verify that the conditions (27), (28) and (25) are satisfied for all $\alpha \le \alpha_{crit} = 0.002$. Therefore, for that rate, we establish a fraction of correctable errors which is more than ten times higher than the previously known worst-case results, as claimed.
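The purely combinatorial part of these conditions, namely the constraints (27) on $(d_v, p, q)$, is easy to check mechanically. The sketch below does so under the assumption that the last constraint is the non-strict inequality $d_v - p \ge 2$, which is the reading under which the parameters $(p, q) = (6, 5)$ admit a valid degree; the function name is hypothetical. The analytic conditions (25) and (28) are not covered here.

```python
def satisfies_matching_constraints(d_v, p, q):
    """Check 2p + q > 2 d_v, p > q and d_v - p >= 2 (assumed non-strict)."""
    return 2 * p + q > 2 * d_v and p > q and d_v - p >= 2


# Enumerate the bit degrees compatible with the matching parameters (6, 5).
feasible_degrees = [d for d in range(1, 20) if satisfies_matching_constraints(d, 6, 5)]
```

Under this reading, `feasible_degrees` contains the single value 8: the first constraint forces $d_v < 8.5$ and the last one forces $d_v \ge 8$.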

V. Conclusion 

The main contribution of this paper is to perform prob- 
abilistic analysis of linear programming (LP) decoding of 
low-density parity-check (LDPC) codes in the finite-length 
regime. Specifically, we showed that for a random LDPC code 
ensemble, the linear programming decoder of Feldman et al. 
succeeds (with high probability) in correcting a constant frac- 
tion of errors that surpasses all prior non-asymptotic results for 
LDPC codes. For a rate 0.5 code, it exceeds the best previous 
finite-length result on LP decoding by a factor greater than 
ten. Despite these substantial improvements, it should be noted
that our analysis still yields very conservative results, roughly 
a factor of 50 lower than the typical empirical performance of 
these codes, as well as the associated asymptotic thresholds. 

Perhaps more important than specific numerical improve- 
ments over past results are the technical innovations that under- 
lie our analysis: a direct treatment of the probabilistic nature 
of bit-flipping channels (as opposed to adversarial analysis
in previous work), and a novel combinatorial characterization 
of LP decoding, based on the notion of a poison hyperflow 
witness. This hyperflow perspective illustrates that the factor 
graph defining a good code should have good flow properties, 
in the sense that no matter which subset of bits are flipped, 
the poison associated with errors can be diffused and routed 
to the unflipped bits. For more general MBIOS channels, the 
amount of poison corresponds exactly to the negative log- 
likelihood that the channel is assigning to each bit, and the 
same characterization of LP decoding holds. 

This intuition suggests that the property of supporting sufficient hyperflow could provide a useful design principle in the finite-length setting; for example, small sets of variables which contract (that is, are jointly adjacent to few checks) will cause pseudocodewords of small pseudoweight.

There are a number of ways in which specific technical 
aspects of the current analysis can likely be sharpened, which 
await further work. In addition, it remains to further explore 
the consequences of our analysis technique for other channels 
and code ensembles, beyond the particular LDPC ensemble 
and binary symmetric channel considered here. 
Appendix

A. Proof of Proposition |7] 

One direction of the claim is immediate: given the weights $\tau'_{ij}$ of any hyperflow, they must satisfy condition (6b) by definition, and moreover it is easy to see that condition (6a) will be automatically satisfied. In the other direction, we transform the edge weights to new weights $\tau'_{ij}$ that satisfy the hyperflow constraints. For each check $j$ separately, we replace the weights on the adjacent edges with new weights that satisfy the hyperflow constraints and, at the same time, do not violate any of the constraints in condition (6b). Consider a check $j$ and order the weights on the adjacent edges in increasing order. Assuming that the check has degree $d(j)$, consider the following cases:

Case I: $0 \le \tau_{1j} \le \tau_{2j} \le \cdots \le \tau_{d(j)j}$. In this case, set $\tau'_{ij} = 0$ for all $i$. The new weights are clearly hyperflow weights and, moreover, it is not hard to verify that none of the conditions (6b) are violated by the transformation.

Case II: $\tau_{1j} < 0 \le \tau_{2j} \le \cdots \le \tau_{d(j)j}$. Set $P_j = -\tau_{1j}$, $\tau'_{1j} = -P_j$ and $\tau'_{ij} = P_j$, $\forall i \in N(j) \setminus \{1\}$. This is a hyperflow weight assignment by construction. Observe, also, that none of the conditions (6b) for the variables in $N(j)$ are violated by this transformation: indeed, for each variable $k \in N(j) \setminus \{1\}$ we have that $\tau_{kj} \ge -\tau_{1j}$ since the weights $\tau_{ij}$ satisfy (6a); therefore, setting $\tau'_{kj} = -\tau_{1j}$ only makes the sum of the edges adjacent to variable $k$ smaller, and the sum was already satisfying condition (6b) before the transformation.

To conclude the claim, notice that at most one edge adjacent to every check $j$ can have negative weight in assignment $\tau$; otherwise, condition (6a) would be violated for that check. □



B. Elementary bounds on binomial coefficients 
For each $\beta \in (0, 1)$, define the binary entropy

$$H(\beta) := -\beta \log_2 \beta - (1 - \beta) \log_2 (1 - \beta),$$

with $H(0) = H(1) = 0$ by continuity. We make use of standard asymptotics of binomial coefficients: for all integers $k$ in the interval $[0, n]$, we have

$$\frac{1}{n} \log_2 \binom{n}{k} = H\left(\frac{k}{n}\right) \pm o(1) \tag{29}$$

as $n$ tends to infinity (e.g., see Cover and Thomas [7]).
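The asymptotic statement (29) is easy to check numerically for moderate $n$. The following sketch compares the normalized logarithm of a binomial coefficient with the binary entropy; the function names are chosen here for illustration and are not taken from the paper.

```python
import math


def binary_entropy(beta):
    """H(beta) in bits, with H(0) = H(1) = 0 by continuity."""
    if beta <= 0.0 or beta >= 1.0:
        return 0.0
    return -beta * math.log2(beta) - (1 - beta) * math.log2(1 - beta)


def log2_binom_rate(n, k):
    """(1/n) * log2 C(n, k), computed exactly from the integer coefficient."""
    return math.log2(math.comb(n, k)) / n


# Gap between the two sides of (29) for n = 10000 and k/n = 0.3.
gap = binary_entropy(0.3) - log2_binom_rate(10_000, 3_000)
```

The gap is positive and of order $(\log_2 n)/n$, consistent with the $o(1)$ correction term.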



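C. Proof of Lemma 5

Note that conditioned on the event $\mathcal{T}(\epsilon_1)$, we are guaranteed that $\sum_{i=1}^{q} i\, Y_i \le vn$. Letting $S_1$ be the fixed subset $\{1, \ldots, s_1\}$, define the event $\mathcal{E}(s_1) := \{|N(S_1)| \le p s_1 + vn\}$ and the quantity

$$P(s_1) := \binom{\lceil \alpha n \rceil}{s_1}\, \mathbb{Q}[\mathcal{E}(s_1)].$$

Using the nature of the bit-regular random ensemble, we have

$$P(s_1) \le \binom{\lceil \alpha n \rceil}{s_1} \binom{\lfloor (1 - R) n \rfloor}{\lfloor p s_1 + vn \rfloor} \left(\frac{\lfloor p s_1 + vn \rfloor}{\lfloor (1 - R) n \rfloor}\right)^{d_v s_1}.$$

Setting $\bar{s}_1 = s_1 / n$ and using standard bounds on binomial coefficients (29), the quantity $\frac{1}{n} \log_2 P(s_1)$ is upper bounded by

$$\alpha H\left(\frac{\bar{s}_1}{\alpha}\right) + (1 - R)\, H\left(\frac{p \bar{s}_1 + v}{1 - R}\right) + d_v \bar{s}_1 \log_2\left(\frac{p \bar{s}_1 + v}{1 - R}\right) + o(1).$$

Defining the function $f$ and value $s_{crit}$ as in the lemma statement, we are guaranteed that $P(s_1)$ decays exponentially in $n$ for all $s_1 \in \{\lceil s_{crit} n \rceil + 1, \ldots, \lceil \alpha n \rceil\}$. To complete the proof of the claim, we claim that $D(s_1)$ can be upper bounded by $P(s_1)$. Indeed, for $s_1 \in \{\lceil s_{crit} n \rceil + 1, \ldots, \lceil \alpha n \rceil\}$, we can either condition on $\mathcal{E}(s_1)$ or its complement to obtain that $D(s_1)$ is upper bounded by

$$\binom{\lceil \alpha n \rceil}{s_1} \Big[\, \mathbb{Q}\big[\exists\, (S_1, S_2) \in A_2 \text{ s.t. } \mathcal{B}(S_1, S_2) \mid \mathcal{E}^c(s_1)\big] + \mathbb{Q}[\mathcal{E}(s_1)] \,\Big],$$

which is equal to $P(s_1)$ because, as argued in Section IV-B, conditioned on the event $\mathcal{E}^c(s_1)$, there can be no $S_2$ such that the event $\mathcal{B}(S_1, S_2)$ holds.

D. Proof of Lemma 6

We have the bound

$$E(\gamma_1, s_1) \le \binom{\lceil \alpha n \rceil}{s_1}\, \mathbb{Q}\big[\,|N(S_1)| = \gamma_1 \mid |S_1| = s_1\,\big],$$

where we have used the fact that the event $\{|N(S_1)| = \gamma_1\}$ is independent of $\mathcal{T}(\epsilon_1)$ under the probability distribution $\mathbb{Q}$. An exact computation yields that $\frac{1}{n} \log_2 \{\mathbb{Q}[\,|N(S_1)| = \gamma_1 \mid |S_1| = s_1\,]\}$ is upper bounded by

$$\frac{1}{n} \log_2 \left[ \binom{\lfloor (1 - R) n \rfloor}{\gamma_1} \left(\frac{\gamma_1}{\lfloor (1 - R) n \rfloor}\right)^{d_v s_1} \right],$$

which is in turn upper bounded by

$$(1 - R)\, H\left(\frac{\gamma_1}{\lfloor (1 - R) n \rfloor}\right) + d_v \frac{s_1}{n} \log_2\left(\frac{\gamma_1}{\lfloor (1 - R) n \rfloor}\right) + o(1),$$

where we have used standard bounds on binomial coefficients (29). Overall, we have

$$\frac{1}{n} \log_2 E(\gamma_1, s_1) \le \alpha H\left(\frac{s_1}{\lceil \alpha n \rceil}\right) + (1 - R)\, H\left(\frac{\gamma_1}{\lfloor (1 - R) n \rfloor}\right) + d_v \frac{s_1}{n} \log_2\left(\frac{\gamma_1}{\lfloor (1 - R) n \rfloor}\right) + o(1) \le 2 + d_v \frac{s_1}{n} \log_2\left(\frac{\gamma_1}{\lfloor (1 - R) n \rfloor}\right) + o(1),$$

since $\alpha, R \in (0, 1)$, and each entropy term remains bounded within $[0, 1]$.

Finally, setting $\bar{s}_1 = s_1 / n$ and $\bar{\gamma}_1 = \gamma_1 / n$, consider the function

$$g(\bar{\gamma}_1) := 2 + d_v \bar{s}_1 \log_2\left(\frac{\bar{\gamma}_1}{1 - R}\right).$$

We have $\lim_{\bar{\gamma}_1 \to 0^+} g(\bar{\gamma}_1) = -\infty$, implying that $E(\gamma_1, s_1)$ decays exponentially fast in $n$ for all $s_1 \ge \lceil \epsilon_2 n \rceil$ and neighborhood sizes $\gamma_1 \le \gamma_{crit}(\epsilon_2)\, n$, where $\gamma_{crit}(\cdot)$ is defined as in the statement of the lemma.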
E. Proof of Lemma 7

Proof: The proof is similar in spirit to the proof of

„, , . , ■ ^1. r 11 ■ 1 u- u J Lemma |6] Taking a term in the summation (|30]l. we can bound 

We begin by proving the following lemma, which provides ' ^ — ' 

u J .u T^t \ It as follows: 

an upper bound on the quantity £/(7i, si). 

Lemma 9 (Lower Regime): If the three conditions dZSl l 

hold, then, for all si G { \e2n\ , . . . , [scrit^il } and 71 > 

7crit(e2)"-, there exists some 7I = 72(scrit,e2) > such that 



■B(si,7i,72) 



< 



- log£'(si,7i) < y^minjO, rfc(si,7i)} + T4(si,7i) + o(l'^,hich is upper bounded by 

n < ^ 

\an 



si 
\an 
si 



?^i(7i: 72)^^2(711 72) 
[^2(71,72), 



n 

where 



k=l 



Si 



'[\N{F)\N{Si)\^-l2 I |iV(5i)| ==7i,|5i| -si] 



Ti = -log 
n 



1 /(I - R)n\ 1 , 

' ' \ '--log 



To = 



n 
1 



■log 



71 

(l-i?)n-7i 
72 

72 



71 



{l-R)n 



Note that Q' term is upper bounded by 



■log 



72 + 7i 



{oLn — si)d-ii 



[{l - R)n\ - ( 72+71 
72 



T3 = -log, 

n Vmin|/i(si,7i,?;),72} 



1 



log 



n ^\{1-R)n 
71 +min{72,/3(si,7i,2/)}y^^^ 



L(l - R)n\ 

3 



( [an] —si)d.^ 



(33) 



so that ilog2B(si,7i,72) < E»=i C'^^i. 7i, 72) + o(l), 
where 



T4 = max max 

72 er y,ey^ 



71 + 72 



i=l 



C2 
C3 



aH 
H 



si/n 



12 1 n 



(1 - i?) - 7i/« 



where 

r 



(a - ^) I0g2 



72/^ + 71/71 



{r7>l, r72»'l + 1, . • . (an - si)} 



-up 



for i = 1, . 



Proof: 

We begin with the decomposition 



Using conditions (1251 ), since s^^t < the term Ci is 
increasing in si/n. Moreover, since adi, < '■^~"'^''~'^''^'^"* , 
second entropy term is the term C2 increasing in 71. Finally, 
the term C3 increasing in si/n and in 71/71. 

Consequently, i log2 i3(si, 71, 72) is upper bounded by the 
function 



where 



Q!7l 
Si 



du [an] —si 

^ [^1(71 7 72) [^2(71: 72) (30) 
72=1 



' a 



H 



7 



1 — i?) — (iiiScrit 

7 + dt,Scrit 



+ d,„(a-Scrit)log; 



(1-i?) 



^^1(71,72) 



J' [3 S2 with (51,^2) e ^2 

s.t. S(5i,52) ||iV(5i)| =71, 

|7V(F)\7V(5i)| =72,|^i| =.si], 
f/2(7i,72) :=Q'[|iV(5i)|=7i, 

|iV(F)\Ar(5i)|=72||5i| = .i] 



Note that lim^^o ^(7) < follows from the third condition 
in the series jZSl l. The remainder of the proof is entirely 
analogous to that of Lemma |6] ■ 



By Lemma [TOl it suffices to provide upper bounds for 
the terms ^(31,71,72) for si e { [e27i] , . . . , [scrit".] }, 
7i > 7crit(e2)'^ and 72 > 7271. Recall the bound i33[ on 
Q'[\N{F)\N{Si)\^l2 I |iV(5i)| =7i,|5i| = si]. Simi- 
and recall that Q' is the distribution Q conditioned on the larly, recall from Appendix|D]that Q' [|7V(S'i)| =7i|S'i| = si] 
event T(ei). We now require a lemma that allows us to restiict is upper bounded by 

appropriately the range of summation over to values of 72 that ^1 |^ / _ ^ d^.si 

scale linearly in n. 

Lemma 10: (72 small): The conditions of Lemma |9] imply 



L(l - i?)7lj 

71 



71 



V{l-R)i 



that there exists some value 73 = 72(sciit) > for which the Recalling the notation 5*1 for the fixed set {1, . . . , si}, he only 
quantity 



G(si,7i) 



\ari\ 



l72"J 

E C/i(7i,72)C^2(7i,72) (32) 
72=1 



decays exponentially in ?i for any si , 71 that satisfy [e27i] < 
si < [scritn-1 and 71 > 7crit(e2)'^- 



missing piece is an upper bound on 

3{SIS2)<eA2 s.t. B{Sl,S2) 

iV(5i*)|=7i, \N{F)\NiSl)\=i2 
Recall that $\mathbb{Q}'$ is the conditional probability given the event $\mathcal{T}(\epsilon_1)$. In this space, every set $S_2 \subseteq F^c(q)$ corresponds to a request vector $y \in \prod_{i=1}^{q} \{0, \ldots, \lceil \bar{y}_i^{up} n \rceil\}$. Moreover, for a set $S_2 \subseteq F^c(q)$ and its corresponding request vector $y$, the event $\mathcal{B}(S_1, S_2)$ is equivalent to the following condition being satisfied:

$$\mathcal{B}(S_1, S_2) \iff |(N(S_2) \cap N(F)) - N(S_1)| < \beta(s_1, \gamma_1, y).$$

Therefore, a union bound over all the possible choices of sets $S_2$ gives the following upper bound for the probability of interest:



Based on the preceding analysis, we can now complete our proof of Lemma 7. Indeed, using Lemmas 5, 6 and 9, we can upper bound $\frac{1}{n} \log_2 \mathbb{Q}[\mathcal{B}(A_2) \mid \mathcal{T}(\epsilon_1)]$. In this upper bound, all the relevant quantities (i.e., $s_1, \gamma_1, \gamma_2, y_1, y_2, \ldots, y_q$) scale linearly with $n$. Therefore, standard bounds on binomial coefficients (29) lead to the claimed form of $F$.



E 

1/1=0 



E 



yi 



F. Proof of Lemma 8

The last thing we need to do is bound the probability $\mathbb{Q}'[\mathcal{B}(A_3)]$ of the bad event ($S_1$ small, $S_2$ large). As usual, we take a union bound over contracting sets $S_1$ of various sizes. Define



Hyi,y2, ■ ■ . ,?/g,7i,72), 



where A(yi, . . . , y^, 71, 72) is the probability, under the distri- 
bution O', of the event 



\iNiS2) n NiF)) \ NiSDl < /3(si, 71, 



\N{Si)\^ji, \N{F)\N{Sl)\^^2j, 

where ^2 corresponds to request vector y. 

In order to complete the proof, we need a final observation. 

Lemma 11: For alH = 1, . . . , if 7i, 72 are fixed, 

then the function A(yi, y2, ■ • ■ , J/q, 7i, 72) is increasing in the 
scalar variable yi e |l, 2, . . . , [^^^J |. 



Proof: Clearly 



is increasing for yi G 



|l, . . . , [^^^2""] |- Therefore, it is enough to establish that 
the probability A(?;i, . . . , y„, 71, 72) is increasing for yi G 

{-up ^ 
1, . . . , [^^^2~"J I ■ ^^^^ ^^^^ follows from the same coupling 
argument used in Section IIV-AI for a variable j G F"^-, 
the number of requests Xj and the size of the intersection 
\N{i) n N{F)\ are positively correlated. Therefore, increas- 
ing the number of edges can only increase the probability 
A(?;i, . . . ,yg,7i,72) of the bad event B{Si,S2). ■ 

Using Lemma [TT] we can now conclude the proof of 
Lemma |9] Denote by yi -.^ | ^^^j" , • ■ • , fyi^"-! |' we 

have that i log HLi E!/!=o"^ A(?/i, ... ,2/5,71,72) is upper 
bounded by 

( 1 



E~^°g( ' ) +-logA(2/i,...,2/g,7i,72) 

Vi^y^ y^^n \ yi J n 

By union bound, the quantity i log A(yi, . . . , j/,, 71, 72) is 
upper bounded by 



1 . f\y'lM\ 1 



-log I 
n L 



72 



n I- \min{/3(si,7i,r),72}^ 
71 +min{/3(si,7i,r),72}y^'''' 



71 + 72 

Putting everything together yields the claim of Lemma |9] 



D'{si) 



\an\ 
si 



S2 C F^iq) with (5i, S'2) G ^3 



s.t. B{SitS2)\ Si is some fixed set of size si.]. 



and therefore 

1^2 nl 
si=l 

Intuitively, it should be clear that this is the easiest regime, because handling requests from $S_1$ is always harder compared to requests from $S_2$ (because variables in $S_2$ have fewer requests). We will make $\epsilon_2$ small enough so that the requests from $S_1$ are completely covered from the neighborhood of $S_2$ (which is always larger than a linear fraction). The function we obtain is strictly dominated by $F(\alpha)$ for sufficiently small $\epsilon_2$, as one would expect, since $F(\alpha)$ is satisfying the requests in a harder regime. We make a formal argument using continuity to establish this fact.

For $\epsilon_2$ sufficiently small, we have that, for all $s_1 \in \{1, \ldots, \lfloor \epsilon_2 n \rfloor\}$,



\ari\ 



< 



\an~\ 



<n{aH 



The remainder of the analysis exploits the fact that for £2 
sufficiently small and any set ^2 of size at least pn, if y is the 
vector of requests from S2, then, with high probability, 

q 

\N{S2)n{N{F)\N{Si))\ > J2iy,+pe2n. 



/3'(e2,2/) 

In words, the neighborhood of set S2 inside N{F) \ N{Si) is 
sufficiently large not only to cover the requests from set S2 
but also from Si. We are going to bound the probability of 
failure, by only allowing ^2 to cover all the requests: 

Q'[3 S2 C F%q) with {Si, S2) G A3 
s.t. B{Si, S2) I 5*1 some fixed set of size si] < 
Q'[3 S2 C F'{q) with (^1,^2) G ^3 
s.t. \NiS2)n{N{F)-N{Si))\ < 
P'{^2, y)}] Si some fixed set of size si]. 



By similar analysis as in the proof of Lemma |9] we obtain 

ilogD'(5i) < F'(a,e2) + o(l), where 

F'{a,e2) ■■= sup sup G'(72, yi, y2, ■ ■ • , ^g, £2), 

72e[0, y.G[y7-/2,y7P] 

and the intermediate function 

2 

G' = G'(72, yi, £2) - ^ min {0, GK72, y)}+G:,(72, 

1=1 

has terms 

G-((l-i.)-.„e2)i/(^^^-^L) + 

, . . -, / dt,e2 +72\ 

d.(«-.2)l0g2 (^TT^l^j' 

^, ^ ^-^^ [^ minh-2,|(e2,y-)} ^ 

/ d..2+minq,_^-(e2,,-)} \ 

G-„H(2),|:„rH(^). 

Note that 

lim G'(72,yi, • . • ,^9,62) = _ lim G(si, 71, 72, yi, . . . , y,, £2), 

where the limit is taken by setting 71 = 8(si) (which will be 
true by concentration). Therefore, we have 

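$$\lim_{\epsilon_2 \to 0} F'(\alpha, \epsilon_2) \le F(\alpha).$$

Consequently, if $F(\alpha) < 0$, it then follows that $\lim_{\epsilon_2 \to 0} F'(\alpha, \epsilon_2) < 0$. By continuity, there exists some value $\epsilon_2 > 0$ such that $F'(\alpha, \epsilon_2) < 0$; for this value of $\epsilon_2$, the probability $\mathbb{Q}[\mathcal{B}(A_3) \mid \mathcal{T}(\epsilon_1)]$ decreases exponentially in $n$.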



